BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database

Authors

Abstract

The task of eukaryotic genome annotation remains challenging. Only a few genomes could serve as standards of annotation achieved through a tremendous investment of human curation efforts. Still, the correctness of all alternative isoforms, even in the best-annotated genomes, could be a good subject for further investigation. The new BRAKER2 pipeline generates and integrates external protein support into the iterative process of training and gene prediction by GeneMark-EP+ and AUGUSTUS. BRAKER2 continues the line started by BRAKER1, where self-training GeneMark-ET and AUGUSTUS made gene predictions supported by transcriptomic data. Among the challenges addressed by the new pipeline was the generation of reliable hints to protein-coding exon boundaries from likely homologous but evolutionarily distant proteins. In comparison with other pipelines for eukaryotic genome annotation, BRAKER2 is fully automatic. It compares favorably under equal conditions with other pipelines, e.g. MAKER2, in terms of accuracy and performance. Development of BRAKER2 should facilitate solving the task of harmonization of annotation of protein-coding genes in genomes of different eukaryotic species. However, we fully understand that several more innovations are needed in transcriptomic and proteomic technologies, as well as in algorithmic development, to reach the goal of highly accurate annotation of eukaryotic genomes.

Similar resources

BRAKER1: Unsupervised RNA-Seq-Based Genome Annotation with GeneMark-ET and AUGUSTUS

Motivation: Gene finding in eukaryotic genomes is notoriously difficult to automate. The task is to design a workflow with a minimal set of tools that would reach state-of-the-art performance across a wide range of species. GeneMark-ET is a gene prediction tool that incorporates RNA-Seq data into unsupervised training and subsequently generates ab initio gene predictions. AUGUSTUS is a gene fin...

Eukaryotic Genome Annotation Pipeline

The NCBI Eukaryotic Genome Annotation Pipeline is an automated pipeline producing annotation of coding and non-coding genes, transcripts, and proteins on finished and unfinished public genome assemblies. It provides content for various NCBI resources including Nucleotide, Protein, BLAST, Gene, and the Map Viewer genome browser. The pipeline uses a modular framework for the execution of all ann...

MitoFish and MitoAnnotator: A Mitochondrial Genome Database of Fish with an Accurate and Automatic Annotation Pipeline

Mitofish is a database of fish mitochondrial genomes (mitogenomes) that includes powerful and precise de novo annotations for mitogenome sequences. Fish occupy an important position in the evolution of vertebrates and the ecology of the hydrosphere, and mitogenomic sequence data have served as a rich source of information for resolving fish phylogenies and identifying new fish species. The impo...

MaGe: a microbial genome annotation system supported by synteny results

Magnifying Genomes (MaGe) is a microbial genome annotation system based on a relational database containing information on bacterial genomes, as well as a web interface to achieve genome annotation projects. Our system allows one to initiate the annotation of a genome at the early stage of the finishing phase. MaGe's main features are (i) integration of annotation data from bacterial genomes en...

Bovine Genome Database: integrated tools for genome annotation and discovery

The Bovine Genome Database (BGD; http://BovineGenome.org) strives to improve annotation of the bovine genome and to integrate the genome sequence with other genomics data. BGD includes GBrowse genome browsers, the Apollo Annotation Editor, a quantitative trait loci (QTL) viewer, BLAST databases and gene pages. Genome browsers, available for both scaffold and chromosome coordinate systems, displ...

Journal

Journal title: NAR Genomics and Bioinformatics

Year: 2021

ISSN: 2631-9268

DOI: https://doi.org/10.1093/nargab/lqaa108